25 research outputs found

Methods for Answer Extraction in Textual Question Answering

Author: Aunimo Lili
Publication venue: 'University of Helsinki Libraries'
Publication date: 12/06/2007

In this thesis we present and evaluate two pattern-matching-based methods for answer extraction in textual question answering systems. A textual question answering system seeks answers to natural language questions from unstructured text. Textual question answering is an important research problem: as the amount of natural language text in digital format keeps growing, the need for novel methods for pinpointing important knowledge in vast textual databases becomes more and more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns. A new type of extraction pattern is also developed. The pattern-matching-based approach is interesting because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system. Publicly available datasets in English are used as training and evaluation data for the methods. The techniques developed are based on the well-known methods of sequence alignment and hierarchical clustering. The similarity metric used is based on edit distance. The main conclusion of the research is that answer extraction patterns consisting of the most important words of the question and of the following information extracted from the answer context (plain words, part-of-speech tags, punctuation marks and capitalization patterns) can be used in the answer extraction module of a question answering system. This type of pattern and the two new methods for generating answer extraction patterns provide average results when compared to those produced by other systems using the same dataset. However, most answer extraction methods in the question answering systems tested with the same dataset are both hand-crafted and based on a system-specific and fine-grained question classification. The new methods developed in this thesis require no manual creation of answer extraction patterns. As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one and is already provided in the publicly available data.

A textual question answering system is a computer program that answers the user's questions with answers it extracts from text documents. Textual question answering systems are an important research problem because the number of text documents in digital form is growing continuously. At the same time, the need grows for information retrieval methods that let users find the essential information in text documents quickly and easily. Question answering systems have been studied since the 1960s. The first systems could answer a narrow set of fixed-form questions about a strictly restricted domain, such as baseball results. Today, research on question answering focuses on systems in which questions can be phrased fairly freely and can concern any domain. In current systems, retrieval often targets large text document collections such as the WWW and newspaper news archives. On the other hand, restricted-domain systems also remain an important research topic. Practical examples of restricted-domain systems are systems that support companies' customer service. These systems automatically handle some of the questions customers send to the company, or serve as a tool for a customer adviser who is looking for information to answer a customer's question. The methods developed in this thesis are applicable to both open-domain and restricted-domain question answering systems. The thesis develops two new methods for extracting answers from text, and a textual question answering system that uses both methods. The methods have been evaluated on publicly available test data. The answer extraction methods developed in the thesis are learning methods. Learning here means that the patterns used for answer extraction need not be programmed by hand; they are generated automatically from example data. Learning makes the development of new question answering systems more efficient. Efficient system development is especially important when several language versions of a system are needed. A learning method also makes it easier to add new question and text types to the system.
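
The abstract above states only that the similarity metric is based on edit distance; it does not give the exact formulation. As a rough illustration, the following minimal sketch computes a token-level Levenshtein distance with unit costs and turns it into a similarity score. The function names `edit_distance` and `similarity`, the normalization by the longer sequence, and the sample answer contexts are all assumptions made for this example, not the thesis's actual method.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences, with unit costs."""
    # prev[j] = distance between the tokens of `a` seen so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]  # deleting the first i tokens of `a` yields an empty sequence
        for j, tok_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (tok_a != tok_b)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed normalization: 1.0 for identical sequences, 0.0 for maximal distance."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Hypothetical answer contexts, with the answer itself replaced by a placeholder.
context_1 = ["was", "born", "in", "ANSWER", ","]
context_2 = ["was", "born", "on", "ANSWER", "."]
print(edit_distance(context_1, context_2))
print(similarity(context_1, context_2))
```

A pairwise similarity of this kind is the sort of input that a hierarchical clustering step could use to group similar answer contexts before patterns are generalized from them.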

Helsingin yliopiston digitaalinen arkisto

Tekstifragmenttien välisen semanttisen samanlaisuuden tunnistaminen (Identifying semantic similarity between text fragments)

Author: Aunimo Lili
Publication venue: Helsingin yliopisto
Publication date: 01/01/2002

Helsingin yliopiston digitaalinen arkisto

The Multilingual Question Answering Track at CLEF

Author: Aunimo Lili
Ayache Christelle
Giampiccolo Danilo
Magnini Bernardo
Osenova Petya
Peñas Anselmo
Rijke Maarten de
Sacaleanu Bogdan
Santos Diana
Sutcliffe Richard
Publication date: 06/11/2008

Repositório Comum

Overview of the CLEF 2005 Multilingual Question Answering Track

Author: Aunimo Lili
Ayache Christelle
Giampiccolo Danilo
Magnini Bernardo
Osenova Petya
Peñas Anselmo
Rijke Maarten de
Sacaleanu Bogdan
Santos Diana
Sutcliffe Richard
Vallin Alessandro
Publication venue: Centromedia
Publication date: 13/10/2009

Repositório Comum

KP-LAB Knowledge Practices Laboratory -- External release of end-user applications

This deliverable describes the M24 release of the end-user applications for knowledge practices software v2.0.0. The deliverable includes the technical development performed until M24 (January 2008) within WP6 according to Description of Work 2.1 and the D6.4 M21 specification of end-user applications. The current release comprises the following sets of tools:
1. Shared Space Tool. The shared space and the accompanying support material can be found on the Internet at: http://2d.mobile.evtek.fi:8080/shared-space
2. Map-It. The installer program for Map-It v2.0.0 is available at: http://www.kp-lab.org/intranet/testable-tools/kp-lab-tools/map-it/map-it-2-0.0 Please consult the "Getting Started" note before installing and using Map-It: http://www.kp-lab.org/intranet/testable-tools/kp-lab-tools/map-it/getting-started-with-map-it
3. Change Laboratory tools. The release targeted at the end users participating in the trials planned to be conducted in the CL Working Knot can be accessed via the following link: http://2d.mobile.evtek.fi:8080/shared-space/cl.html Anyone who wishes to try the software out but is not participating in the Change Laboratory trials should use the development deployment at: http://mielikki.mobile.evtek.fi/shared-space/cl.html
The M24 release of the Semantic Multimedia Annotation tools is still delayed. The release of the CASS Memo Client has been postponed and will be included in the M28 release in DoW3.

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes

A MODEL FOR BUILDING SKILLS AND KNOWLEDGE NEEDED IN THE JOB MARKET

Author: Aunimo Lili
Huttunen Salla
Publication venue: 'IATED Academy'
Publication date: 01/01/2020

Finnish IT companies are facing a shortage of software engineers in several fields of software development. The field evolves quickly as new technologies emerge, as the processing power of computers grows and as data available for processing becomes abundant. How can a university of applied sciences keep its teaching relevant from the point of view of companies that need personnel with new skills and knowledge? How do the teachers keep their own professional knowledge and skills up to date to be able to pass them on to their students? How do the educational institutions know what kind of skills and knowledge the employers need in the first place? This paper presents a model for tackling the above-mentioned challenges in the context of university-level education of future and current IT professionals. The model has been tested in the teaching of artificial intelligence to undergraduate students of business information technology. Experiences from the implementations of the model have been gathered and analysed. The results show that the model is clearly a success among all stakeholders. The main novelty of the model is that it allows the dynamic and timely adjustment of curricula when new skill and knowledge requirements arise from the industry.

Crossref

Theseus

Exploiting User-Generated Content for Service Improvement: Case Airport Twitter Data

Author: Aunimo Lili
Martin-Domingo Luis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

The study illustrates how airport collaborative networks can profit from the richness of data now available due to digitalization, using a co-creation process in which passenger-generated content is leveraged to identify possible service improvement areas. A Twitter dataset of 949,497 tweets from 100 airports is analyzed for the four-year period 2018-2021, the second half of which falls within the COVID-19 period. The Latent Dirichlet Allocation (LDA) method was used for topic discovery and a lexicon-based method for sentiment analysis of the tweets. The COVID-19-related tweets showed lower passenger sentiment, which can be an indication of a lower perceived service level. The research successfully created and tested a methodology for leveraging user-generated content to identify possible service improvement areas in an ecosystem of services. One of the outputs of the methodology is a list of COVID-19 terms in the airport context.

Theseus

OPEN DATA COURSE PROJECTS IN BUSINESS INTELLIGENCE

Author: Aunimo Lili
Kauppinen Raine
Kekkonen Hami
Publication venue: 'IATED Academy'
Publication date: 01/01/2020

This paper discusses the exploitation of open data in course projects for business information students. Open data has been used for several years in teaching business intelligence to second-year students of business information technology. Alternatives to open data would be teacher-created data, educational example data from a commercial software provider, or proprietary data from a company. All of these data types prove challenging when the goal is to offer students project-based learning with authentic data that permits reuse. This paper describes a model for a course on business intelligence that makes extensive use of open data. It presents an example of a successful course project implementing the model. Lastly, the experiences and feedback from all stakeholders that took part in the course implementation are presented.

Theseus

RPA Experiments in SMEs Through a Collaborative Network

Author: Aunimo Lili
Kedziora Damian
Kortesalmi Heli
Publication venue: Springer
Publication date: 01/01/2023

Robotic Process Automation (RPA) technology has been widely applied in many types of organizations. It is embraced in the hope of increased productivity, quality, and employee satisfaction. Intelligent Automation, its further enhancement, will strongly impact Society 5.0, as it drives productivity through digital technologies, contributing to a more human-friendly working environment. Our research describes the implementation of RPA in financial processes at several SMEs in Finland, as well as the construction of a collaborative network for building and sharing expertise on RPA. With our multiple case study, we explored the main drivers of RPA implementation at SMEs. The research was conducted among 12 SMEs and numerous other organizations participating in a collaborative network for RPA. We conclude that the desire to automate routine work to increase satisfaction at work is the main driver behind RPA implementation in SMEs.

Theseus